Time domain watermarking of multimedia signals

ABSTRACT

A method of generating a watermark signal, embedding the watermark signal within a multimedia signal, and subsequently detecting the watermark signal is described. The watermark signal is the sum of two sequences of values, the second sequence of values being a circularly shifted version of the first sequence.

This application is a national stage entry of International Application No. PCT/IB03/00785.

The present invention relates to apparatus and methods for encoding and decoding information in multimedia signals, such as audio, video or data signals.

Watermarking of multimedia signals is a technique for the transmission of additional data along with the multimedia signal. For instance, watermarking techniques can be used to embed copyright and copy control information into audio signals.

The main requirement of a watermarking scheme is that it is not observable (i.e. in the case of an audio signal, it is inaudible) whilst being robust to attacks to remove the watermark from the signal (e.g. removing the watermark will damage the signal). It will be appreciated that the robustness of a watermark will normally be a trade off against the quality of the signal in which the watermark is embedded. For instance, if a watermark is strongly embedded into an audio signal (and is thus difficult to remove) then it is likely that the quality of the audio signal will be reduced.

Various types of audio watermarking schemes have been proposed, each with its own advantages and disadvantages. For instance, one type of audio watermarking scheme is to use temporal correlation techniques to embed the desired data (e.g. copyright information) into the audio signal. This technique is effectively an echo-hiding algorithm, in which the strength of the echo is determined by solving a quadratic equation. The quadratic equation is generated by auto-correlation values at two positions: one at delay equal to τ, and one at delay equal to 0. In such a scheme, as echoes of the audio signal are added to the original audio signal, the resulting signal is in fact both an amplitude and a phase modulated version of the original audio signal. At the detector, the watermark is extracted by determining the ratio of the auto-correlation function at the two delay positions.

This correlation technique has a number of drawbacks. For instance, it is only possible to embed the watermark where the resulting quadratic equation has real roots, and consequently this reduces the robustness (ability of the watermark to withstand attacks) for a given audio quality. Further, the performance of the correlation algorithm is dependent upon the value of the delay τ and the characteristics of the original signal. This is a significant drawback.

Also known are watermarking schemes based on the amplitude modulation of DFT (Discrete Fourier Transform) coefficients. As such schemes require the calculation of DFTs at both the encoder and the decoder, the resulting hardware for implementing such DFT schemes tends to be relatively complex, and hence the scheme tends to be slow to perform and costly. Further, watermarks cannot be satisfactorily embedded in audio segments that have sparse frequency characteristics, and hence the DFT scheme does not work well with particular types of music.

WO 00/00969 describes an alternative technique for embedding or encoding auxiliary signals (such as copyright information) into a multimedia host or cover signal. A replica of the cover signal, or a portion of the cover signal in a particular domain (time, frequency or space), is generated according to a stego key, which specifies modification values to the parameters of the cover signal. The replica signal is then modified by an auxiliary signal corresponding to the information to be embedded, and inserted back into the cover signal so as to form the stego signal.

At the decoder, in order to extract the original auxiliary data, a replica of the stego signal is generated in the same manner as the replica of the original cover signal, and requires the use of the same stego key. The resulting replica is then correlated with the received stego signal, so as to extract the auxiliary signal. The extraction of the auxiliary signal is thus relatively complex, and requires the stego key at both the encoder (or embedder) and decoder (or detector). Additionally, a brute force search is required to synchronize to the auxiliary signal at the detector.

Further, performance of the payload extraction is dependent on how well the auxiliary signal can be estimated. In a system with a high expected error rate of the payload bits in the auxiliary signal, this is very difficult to achieve. Solutions would lead to very complex error correction methods, or significantly limit the information capacity.

It is an object of the present invention to provide a watermarking scheme that substantially addresses at least one of the problems of the prior art.

In a first aspect, the present invention provides a method of generating a watermark signal for embedding in a multimedia signal, the method comprising the steps of: (a) generating two sequences of values, the second sequence being a circularly shifted version of the first sequence; and (b) generating a watermark signal by adding the values of the first sequence to the respective values in the corresponding positions of the second sequence.

Preferably, each value of the first and second sequences is represented by a pulse of preferable width T_(s) so as to form rectangular wave signals.

Preferably, in step (a) a window shaping function is applied to convert each of the rectangular signals into respective smoothly varying signals, with the resulting smoothly varying signals being added in step (b) to form the watermark signal.

Preferably, each one of said sequences of values is convolved with a window shaping function which has a width of at least T_(s), so as to generate two smoothly varying signals, these smoothly varying signals being added together in step (b) so as to form the watermark signal.

Preferably, said window shaping function has a band limited frequency behavior and a smooth temporal behavior.

Preferably, said window shaping function has a symmetric or anti-symmetric temporal behavior.

Preferably, said window shaping function comprises at least one of a raised cosine function and a bi-phase function.

Preferably, the watermark signal is generated by the addition of the two smoothly varying signals with a relative delay of T_(r), where T_(r)<T_(s).

Preferably, T_(r) is chosen such that maximum amplitude points of the first smoothly varying signal coincide with zero-crossings of the second smoothly varying signal, and vice-versa.

Preferably, said watermark signal has a payload that is encoded in the combination of said two sequences of values.

In another aspect, the present invention provides an apparatus arranged to generate a watermark signal for embedding in a multimedia signal, the apparatus comprising: (a) a sequence generator arranged to use a first sequence of values to generate a second sequence of values, the second sequence being a circularly shifted version of the first sequence; and (b) a signal generator arranged to generate a watermark signal by adding the values of the first sequence to the respective values in the corresponding positions of the second sequence.

Preferably, the apparatus further comprises a signal conditioner arranged to convert each sequence of values into a smoothly varying signal.

Preferably, the apparatus is arranged to generate said first sequence of values by circularly shifting a primary sequence of values.

In a further aspect, the present invention provides a method of embedding a watermark in a multimedia signal, the method comprising the steps of: (a) generating a watermark signal equal to the sum of two sequences of values, the second sequence being a circularly shifted version of the first sequence of values; (b) generating a host modifying multimedia signal as a product of the watermark signal and the multimedia signal; and (c) generating a watermarked multimedia signal by adding a scaled version of said host modifying multimedia signal to the multimedia signal.

Preferably, said scaled version of the host modifying signal is generated by controlling the scaling factor by a predetermined cost-function.

Preferably, said cost function comprises multiple scaling factors, each scaling factor being defined separately for one or more of the plurality of frequency bands in the multimedia signal.

Preferably, said frequency bands are determined according to a model of the human auditory and/or visual system.

Preferably, in step (b) said host modifying multimedia signal is generated by multiplying said watermark signal with an extracted portion of the multimedia signal.

Preferably, said extracted portion of the multimedia signal is obtained by filtering at least a portion of the multimedia signal with respect to at least one of frequency, space and time.

The method preferably further comprises the steps of: (d) generating a second watermark signal equal to the sum of a third and a fourth sequence of values, the fourth sequence being a circularly shifted version of the third sequence of values; (e) extracting a second portion of the multimedia signal, the second portion being filtered such that it does not overlap with said first portion; and (f) generating a watermarked multimedia signal by adding the product of the second watermark signal and the second extracted portion of the multimedia signal to the watermarked multimedia signal.

In another aspect the present invention provides an apparatus arranged to embed a watermark signal in a multimedia signal, the apparatus comprising: (a) a watermark generator arranged to generate a signal equal to the sum of two sequences of values, the second sequence being a circularly shifted version of the first sequence of values; and (b) an output signal generator arranged to generate a watermarked multimedia signal by adding the product of the watermark signal and the multimedia signal to the multimedia signal.

Preferably, the apparatus further comprises a signal extractor arranged to extract a first portion of the multimedia signal.

In a further aspect the present invention provides a multimedia signal comprising a watermark, wherein the original multimedia signal has been watermarked by modifying the temporal envelope of the original signal by the watermark, the watermark comprising the sum of a first and a second sequence of values, the second sequence being a circularly shifted version of the first sequence.

In another aspect the present invention provides a method of detecting a watermark signal embedded in a multimedia signal, the method comprising the steps of: (a) receiving a multimedia signal that may potentially be watermarked by a watermark signal modifying the temporal envelope of the host multimedia signal; (b) extracting an estimate of the watermark from said received signal; and (c) correlating the estimate of the watermark with a reference version of the watermark so as to determine whether the received signal was watermarked.

Preferably, the watermark signal has a payload, and the method further comprises the step of determining the payload of the watermark.

In a further aspect, the present invention provides a watermark detector apparatus arranged to detect whether a watermark signal is embedded within a multimedia signal, the watermark detector comprising: (a) a receiver arranged to receive a multimedia signal that may potentially be watermarked by a watermark signal modifying the temporal envelope of the host multimedia signal; (b) an extractor arranged to extract an estimate of the watermark from said received signal; and (c) a correlator arranged to correlate the estimate of the watermark with a reference version of the watermark so as to determine whether the received signal was watermarked.

Preferably, the apparatus further comprises a payload detector arranged to determine if a payload is present within said watermark and to determine the value of said payload.

For a better understanding of the invention, and to show how embodiments of the same may be carried into effect, reference will now be made, by way of example, to the accompanying diagrammatic drawings in which:

FIG. 1 is a diagram illustrating a watermark embedding apparatus in accordance with an embodiment of the present invention;

FIG. 2 shows a signal portion extraction filter H used in one preferred embodiment;

FIGS. 3a and 3b show respectively the typical amplitude and phase responses as a function of frequency of the filter H used in FIG. 2;

FIG. 4 shows the payload embedding and watermark conditioning stage;

FIG. 5 is a diagram illustrating the details of the watermark conditioning apparatus H_(c) of FIG. 4, including charts of the associated signals at each stage;

FIGS. 6a and 6b show two preferred alternative window shaping functions s(n) in the form of respectively a raised cosine function and a bi-phase function;

FIGS. 7a and 7b show respectively the frequency spectra for a watermark sequence conditioned with a raised cosine and a bi-phase shaping window function;

FIG. 8 is a diagram illustrating a watermark detector in accordance with an embodiment of the present invention;

FIG. 9 diagrammatically shows the whitening filter H_(w) of FIG. 8, for use in conjunction with a raised cosine shaping window function;

FIG. 10 diagrammatically shows the whitening filter H_(w) of FIG. 8, for use in conjunction with a bi-phase window shaping function; and

FIG. 11 shows a typical shape of the correlation function output from the correlator of the watermark detector shown in FIG. 8.

FIG. 1 shows a block diagram of the apparatus required to perform the digital signal processing for embedding a multi-bit payload watermark w_(c) into a host signal x in accordance with a preferred embodiment of the present invention.

A host signal x is provided at an input 12 of the apparatus. The host signal x is passed in the direction of output 14 via the adder 22. However, a replica of the host signal x (input 8) is split off in the direction of the multiplier 18, for carrying the watermark information.

The watermark signal w_(c) is obtained from the payload embedder and watermark conditioning apparatus 6, and derived from the watermark random sequence w_(s) (input 4), which is input to the payload embedder and watermark conditioning apparatus. The multiplier 18 is utilized to calculate the product of the watermark signal w_(c) and the replica audio signal x. The resulting product, w_(c)x, is then passed via a gain controller 24 to the adder 22. The gain controller 24 is used to amplify or attenuate the signal by a gain factor α.

The gain factor α controls the trade off between the audibility and the robustness of the watermark. It may be a constant, or variable in at least one of time, frequency and space. The apparatus in FIG. 1 shows that, when α is variable, it can be automatically adapted via a signal analyzing unit 26 based upon the properties of the host signal x. Preferably, the gain α is automatically adapted, so as to minimize the impact on the signal quality, according to a properly chosen perceptibility cost-function, such as a psycho-acoustic model of the human auditory system (HAS). Such a model is, for instance, described in the paper by E. Zwicker, “Audio Engineering and Psychoacoustics: Matching signals to the final receiver, the Human Auditory System”, Journal of the Audio Engineering Society, Vol. 39, pp. 115-126, March 1991.

In the following, an audio watermark is utilized, by way of example only, to describe this embodiment of the present invention.

The resulting watermarked audio signal y is then obtained at the output 14 of the embedding apparatus 10 by adding an appropriately scaled version of the product of w_(c) and x to the host signal:

$$y[n] = x[n] + \alpha\, w_c[n]\, x[n] \tag{1}$$

Preferably, the watermark w_(c) is chosen such that when multiplied with x, it predominantly modifies the short time envelope of x.
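By way of illustration only, equation (1) can be sketched in a few lines of Python. The function name `embed` and the sample values are hypothetical and are not taken from the described apparatus; α is treated here as a constant, although the text allows it to vary.

```python
def embed(x, w_c, alpha):
    """Return y[n] = x[n] + alpha * w_c[n] * x[n], as in equation (1).

    x and w_c are equal-length lists of samples; alpha is a constant gain.
    """
    if len(x) != len(w_c):
        raise ValueError("x and w_c must have the same length")
    return [xn + alpha * wn * xn for xn, wn in zip(x, w_c)]


# Hypothetical host samples and watermark values, for illustration only.
host = [0.5, -0.25, 1.0, -1.0]
watermark = [1.0, -1.0, 0.5, 0.0]
marked = embed(host, watermark, alpha=0.1)
```

Because the watermark multiplies the host, samples where w_(c)[n] is zero are left unchanged, and the modification scales with the local signal level.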

FIG. 2 shows one preferred embodiment in which the input 8 to the multiplier 18 in FIG. 1 is obtained by filtering the replica of the host signal x using a filter H in the filtering unit 15. If the filter output is denoted by x_(b), then according to this preferred embodiment, the watermarked signal is generated by adding an appropriately scaled version of the product of x_(b) and the watermark w_(c) to the host signal x.

Let $\bar{x}_b$ be defined such that $\bar{x}_b = x - x_b$, and y_(b) be defined such that $y = y_b + \bar{x}_b$; then the watermarked signal y can be written as

$$y[n] = (1 + \alpha\, w_c[n])\, x_b[n] + \bar{x}_b[n] \tag{2}$$

and the envelope modulated portion y_(b) of the watermarked signal y is given as

$$y_b[n] = (1 + \alpha\, w_c[n])\, x_b[n] \tag{3}$$

Preferably, as shown in FIG. 3, the filter H is a linear phase band-pass filter characterized by its lower cut off frequency f_(L) and upper cut off frequency f_(H). As can be seen in FIG. 3b, the filter H has a linear phase response with respect to frequency f within the pass band (BW). Thus, when H is a band-pass filter, x_(b) and $\bar{x}_b$ are the in-band and out-of-band components of the host signal respectively. For optimum performance, it is preferable that the signals x_(b) and $\bar{x}_b$ are in phase. This is achieved by appropriately compensating for the phase distortion produced by filter H.

In FIG. 4, the details of the payload embedder and watermark conditioning unit 6 are shown. In this unit the watermark seed signal w_(s) is converted into a multi-bit watermark signal w_(c).

Firstly a finite length, preferably zero mean and uniformly distributed random sequence w_(s) is generated using a random number generator with an initial seed S. As will be appreciated later, it is preferable that this initial seed S is known to both the embedder and the detector, such that a copy of the watermark signal can be generated at the detector for comparison purposes. This results in the sequence of length L_(w):

$$w_s[k] \in [-1, 1], \quad k = 0, 1, 2, \ldots, L_w - 1 \tag{4}$$

Then the sequence w_(s) is circularly shifted by the amounts d₁ and d₂ using the circularly shifting units 30 to obtain the random sequences w_(d1) and w_(d2) respectively. It will be appreciated that these two sequences (w_(d1) and w_(d2)) are effectively a first sequence and a second sequence, with the second sequence being circularly shifted with respect to the first. Each sequence w_(di), i=1,2, is subsequently multiplied with a respective sign bit r_(i), in the multiplying unit 40, where r_(i)=+1 or −1, the respective values of r₁ and r₂ remaining constant, and only changing when the payload of the watermark is changed. Each sequence is then converted into a periodic, slowly varying narrow-band signal w_(i) of length L_(w)T_(s) by the watermark conditioning circuit 20 shown in FIG. 4. Finally, the slowly varying narrow-band signals w₁ and w₂ are added with a relative delay T_(r) (where T_(r)<T_(s)) to give the multi-bit payload watermark signal w_(c). This is achieved by first delaying the signal w₂ by the amount T_(r) using delaying unit 45 and subsequently by adding it to w₁ with the adding unit 50.

FIG. 5 shows in more detail the watermark conditioning apparatus 20 used in the payload embedder and watermark conditioning apparatus 6. The watermark random sequence w_(s) is input to the conditioning apparatus 20.

For convenience, the modification of only one of the sequences w_(di) is shown in FIG. 5, but it will be appreciated that each of the sequences is modified in a similar manner, with the results being added to obtain the watermark signal w_(c).

As shown in FIG. 5, each watermark signal sequence w_(di)[k], i=1,2 is applied to the input of a sample repeater 180. Chart 181 illustrates one of the possible sequences w_(di) as a sequence of values of random numbers between +1 and −1, with the sequence being of length L_(w). The sample repeater repeats each value within the watermark random sequence T_(s) times, so as to generate a pulse train signal of rectangular shape. T_(s) is referred to as the watermark symbol period and represents the span of the watermark symbol in the audio signal. Chart 183 shows the results of the signal illustrated in chart 181 once it has passed through the sample repeater 180.

A window shaping function s[n], such as a raised cosine window, is then applied to convert the rectangular pulse signals derived from w_(d1) and w_(d2) into slowly varying signals w₁[n] and w₂[n] respectively.

Chart 184 shows a typical raised cosine window shaping function, which is also of period T_(s).

The generated signals w₁[n] and w₂[n] are then added up with a relative delay T_(r) (where T_(r)<T_(s)) to give the multi-bit payload watermark signal w_(c)[n], i.e.

$$w_c[n] = w_1[n] + w_2[n - T_r] \tag{5}$$

The value of T_(r) is chosen such that the zero crossings of w₁ match the maximum amplitude points of w₂ and vice-versa. Thus, for a raised cosine window shaping function T_(r)=T_(s)/2, and for a bi-phase window shaping function T_(r)=T_(s)/4. For other window shaping functions, other values of T_(r) are possible.
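The generation steps described above (seeded random sequence, circular shifts, sign bits, sample repetition, window shaping, delayed addition) can be sketched as follows, by way of example only. Several details are assumptions made for this sketch and are not stated in the text: the raised cosine is taken as 0.5(1 − cos(2πn/T_(s))), the window is applied by multiplying each repeated symbol by it, and the delay T_(r) is realized as a circular shift because the signals are described as periodic. All function names and parameter values are hypothetical.

```python
import math
import random


def circular_shift(seq, d):
    """Circularly shift seq to the right by d positions."""
    d %= len(seq)
    return seq[-d:] + seq[:-d] if d else list(seq)


def raised_cosine(T_s):
    """Assumed raised cosine window: zero at n = 0, maximum at n = T_s / 2."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / T_s)) for n in range(T_s)]


def condition(symbols, window):
    """Repeat each symbol len(window) times and shape it with the window."""
    return [v * s for v in symbols for s in window]


def make_watermark(seed, L_w, d1, d2, r1, r2, T_s, T_r):
    """Build w_c[n] = w_1[n] + w_2[n - T_r] as in equation (5)."""
    rng = random.Random(seed)                      # seed S shared with the detector
    w_s = [rng.uniform(-1.0, 1.0) for _ in range(L_w)]
    window = raised_cosine(T_s)
    w1 = condition([r1 * v for v in circular_shift(w_s, d1)], window)
    w2 = condition([r2 * v for v in circular_shift(w_s, d2)], window)
    w2_delayed = circular_shift(w2, T_r)           # circular delay (assumption)
    return [a + b for a, b in zip(w1, w2_delayed)]


# Hypothetical parameters; T_r = T_s / 2 as stated for the raised cosine window.
w_c = make_watermark(seed=1, L_w=8, d1=0, d2=3, r1=1, r2=-1, T_s=16, T_r=8)
```

With this assumed window and T_(r)=T_(s)/2, the two shaped windows sum to one at every sample, so the maxima of one component fall on the zeros of the other.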

As will be appreciated from the description below, during detection the watermarked signal carrying w_(c)[n] will generate two correlation peaks that are separated by pL (as can be seen in FIG. 11). The value pL is part of the payload, and is defined as

$$pL = |d_2 - d_1| \bmod \left\lceil \frac{L_w}{2} \right\rceil \tag{6}$$

In addition to pL, extra information can be encoded by changing the relative signs of the embedded watermarks.

In the detector, this is seen as a relative sign r_(sign) between the correlation peaks. It will be seen that r_(sign) can take four possible values, and may be defined as:

$$r_{sign} = \frac{2\rho_1 + \rho_2 + 3}{2} \in \{0, 1, 2, 3\} \tag{7}$$

where ρ₁=sign(cL₁) and ρ₂=sign(cL₂) are respectively estimates of the sign bits r₁ (input 80) and r₂ (input 90) of FIG. 4, and cL₁ and cL₂ are the values of the correlation peak corresponding to w_(d1) and w_(d2) respectively. The overall watermark payload pL_(w) is then given as a combination of r_(sign) and pL:

$$pL_w = \langle r_{sign},\, pL \rangle \tag{8}$$

The maximum information (I_(max)), in number of bits, that can be carried by a watermark sequence of length L_(w) is thus given by:

$$I_{\max} = \log_2\left(4 \cdot \left\lceil \frac{L_w}{2} \right\rceil\right) \text{ bits} \tag{9}$$
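As a worked example of equations (6), (7) and (9), the following sketch computes the payload components for a hypothetical sequence length L_(w)=1023 and hypothetical shifts; the function names are illustrative only.

```python
import math


def payload_shift(d1, d2, L_w):
    """Equation (6): pL = |d2 - d1| mod ceil(L_w / 2)."""
    return abs(d2 - d1) % math.ceil(L_w / 2)


def r_sign(rho1, rho2):
    """Equation (7): map the two peak signs (+1 or -1) to a value in {0, 1, 2, 3}."""
    return (2 * rho1 + rho2 + 3) // 2


def max_information_bits(L_w):
    """Equation (9): I_max = log2(4 * ceil(L_w / 2)) bits."""
    return math.log2(4 * math.ceil(L_w / 2))


# Hypothetical values chosen for illustration.
pL = payload_shift(d1=100, d2=350, L_w=1023)   # 250 mod 512
signs = [r_sign(a, b) for a in (-1, 1) for b in (-1, 1)]
bits = max_information_bits(1023)              # log2(4 * 512)
```

For L_(w)=1023 the ceiling term is 512, so the shift contributes 9 bits and the two sign bits contribute 2 more, giving 11 bits in total.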

In such a scheme, the payload is immune to relative offset between the embedder and the detector, and also to possible time scale modifications. The window shaping function has been identified as one of the main parameters that controls the robustness and audibility behavior of the present watermarking scheme. As illustrated in FIGS. 6a and 6b, two examples of possible window shaping functions are herein described: a raised cosine function and a bi-phase function.

It is preferable to use a bi-phase window function instead of a raised cosine window function, so as to obtain a quasi DC-free watermark signal. This is illustrated in FIGS. 7a and 7b, which show the frequency spectra corresponding to a watermark sequence (in this case a sequence of w_(di)[k]={1,1,−1,−1,−1,}) conditioned with respectively a raised cosine and a bi-phase window shaping function. As can be seen, the frequency spectrum for the raised cosine conditioned watermark sequence has a maximum at frequency f=0, whilst the frequency spectrum for the bi-phase shaped watermark sequence has a minimum at f=0, i.e. it has very little DC component.

Useful information is only contained in the non-DC component of the watermark. Consequently, for the same added watermark energy, a watermark conditioned with the bi-phase window will carry more useful information than one conditioned by the raised cosine window. As a result, the bi-phase window offers superior audibility quality for the same robustness or, conversely, it allows a better robustness for the same audibility quality.
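The difference in DC content between the two windows can be illustrated numerically. The exact window shapes of FIGS. 6a and 6b are not reproduced in the text, so the sketch below assumes a raised cosine of the form 0.5(1 − cos(2πn/T_(s))) and, as a stand-in for the bi-phase function, one full sine cycle per symbol period; both shapes and all names are assumptions made for illustration.

```python
import math


def raised_cosine(T_s):
    """Assumed raised cosine window over one symbol period."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / T_s)) for n in range(T_s)]


def bi_phase(T_s):
    """Assumed bi-phase window: positive first half, negative second half."""
    return [math.sin(2.0 * math.pi * n / T_s) for n in range(T_s)]


def dc_component(window):
    """Mean value of the window, i.e. its component at frequency f = 0."""
    return sum(window) / len(window)


T_s = 64  # hypothetical symbol period in samples
dc_raised_cosine = dc_component(raised_cosine(T_s))
dc_bi_phase = dc_component(bi_phase(T_s))
```

Under these assumptions the raised cosine has a mean of one half, whereas the anti-symmetric window averages to zero, which is consistent with the quasi DC-free behavior described for the bi-phase function.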

Such a bi-phase function could also be utilized as a window shaping function for other watermarking schemes. In other words, a bi-phase function could be applied to reduce the DC component of signals (such as a watermark) that are to be incorporated into another signal.

FIG. 8 shows a block diagram of a watermark detector (200, 300, 400). The detector consists of three major stages: (a) the watermark symbol extraction stage (200), (b) the buffering and interpolation stage (300), and (c) the correlation and decision stage (400).

In the symbol extraction stage (200), the received watermarked signal y′[n] is processed to generate multiple (N_(b)) estimates of the watermark sequence, which are multiplexed into the signal w_(e)[m]. These estimates of the watermark sequence are required to resolve any time offset that may exist between the embedder and the detector, so that the watermark detector can synchronize to the watermark sequence inserted in the host signal.

In the buffering and interpolation stage (300), these estimates are de-multiplexed into N_(b) separate buffers. An interpolation is subsequently applied to each buffer to resolve possible timescale modifications that may have occurred. For instance, a drift in sampling (clock) frequency may result in a stretch or shrink in the time domain signal (i.e. the watermark may have been stretched or shrunk).

In the correlation and decision stage (400), the content of each buffer is correlated with the reference watermark and the maximum correlation peaks are compared against a threshold to determine the likelihood of whether the watermark is indeed embedded within the received signal y′[n].
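A minimal sketch of the correlation and decision step is given below, by way of example only. It assumes a circular correlation over all lags and a peak normalized by the reference energy; the text does not specify either choice, and the threshold value, the seed and the function names are hypothetical.

```python
import random


def circular_correlation(estimate, reference):
    """Correlate estimate with every circular shift of reference."""
    L = len(reference)
    return [
        sum(estimate[k] * reference[(k - lag) % L] for k in range(L))
        for lag in range(L)
    ]


def detect(estimate, reference, threshold):
    """Return (found, peak_lag, normalized_peak) for one buffer."""
    corr = circular_correlation(estimate, reference)
    energy = sum(v * v for v in reference)
    peak_lag = max(range(len(corr)), key=lambda lag: abs(corr[lag]))
    peak = corr[peak_lag] / energy
    return abs(peak) >= threshold, peak_lag, peak


# Reference regenerated from a shared seed; the estimate is the reference
# circularly shifted by 3 symbols, standing in for a noiseless extraction.
rng = random.Random(7)
reference = [rng.uniform(-1.0, 1.0) for _ in range(64)]
estimate = reference[-3:] + reference[:-3]
found, lag, peak = detect(estimate, reference, threshold=0.5)
```

In the scheme described, the estimate contains two shifted copies of the reference, so two peaks would be expected; their separation and signs would then give pL and r_(sign).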

In order to maximize the accuracy of the watermark detection, the watermark detection process is typically carried out over a length of received signal y′[n] that is 3 to 4 times that of the watermark sequence length. Thus each watermark symbol to be detected can be constructed by taking the averages of several symbols. This averaging process is referred to as smoothing, and the number of times the averaging is done is referred to as the smoothing factor s_(f). Thus, the detection window length L_(D) is the length of the audio segment (in number of samples) over which a watermark detection truth-value is reported. Consequently, L_(D)=s_(f)L_(w)T_(s), where T_(s) is the symbol period and L_(w) the number of symbols within the watermark sequence. Typically, the length (L_(b)) of each buffer 320 within the buffering and interpolation stage is L_(b)=s_(f)L_(w).

In the watermark symbol extraction stage 200 shown in FIG. 8, the incoming watermark signal y′[n] is input to the signal conditioning filter H_(b) (210). This filter 210 is typically a band pass filter and has the same behavior as the corresponding filter H (15) shown in FIG. 2. The output of the filter H_(b) is y′_(b)[n], and assuming linearity within the transmission channel, it follows from equations (2) and (3):

$$y'_b[n] \approx y_b[n] = (1 + \alpha\, w_c[n])\, x_b[n] \tag{10}$$

Note that when no filter is used in the embedder (i.e., when H=1) then H_(b) in the detector can also be omitted, or it can still be included to improve the detection performance. If H_(b) is omitted, then y_(b) in equation (10) is replaced with y. The rest of the processing is the same.

For simplification, it is assumed that there is perfect synchronism between the embedder and the detector (i.e. no offset and no change in timescale), and that the audio signal is divided into frames of length T_(s), and that y′_(b,m)[n] is the n-th sample of the m-th frame of the filtered signal y′_(b)[n]. It should be noted that if there is not perfect synchronism between the embedder and the detector, then any deviation can be compensated for within the buffering and interpolation stage 300 utilizing techniques known to the skilled person, e.g. iteratively searching through all possible scale and offset modifications until a best match is achieved.

The energy E[m] corresponding to the y′_(b,m)[n] frame is:

$$E[m] = \sum_{n=0}^{T_s - 1} \left| y'_{b,m}[n]\, S[n] \right|^2 \tag{11}$$

where S[n] is the same window shaping function used in the watermark conditioning circuit of FIG. 5. A person skilled in the art will appreciate that equation 11 represents a matched filter receiver, and is the optimum receiver when the symbol period is perfectly synchronized. Notwithstanding this fact, from now on, we set S[n]=1 in order to simplify subsequent explanations.
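The frame energy of equation 11 can be sketched as follows, by way of example only. The function name is hypothetical, the window defaults to S[n]=1 as in the simplification above, and any trailing samples that do not fill a whole frame are simply ignored in this sketch.

```python
def frame_energies(y_b, T_s, window=None):
    """Equation (11): E[m] = sum over n of |y'_{b,m}[n] * S[n]|^2.

    y_b is the filtered signal, T_s the frame (symbol) length in samples,
    and window the shaping function S[n]; S[n] = 1 is used when omitted.
    """
    if window is None:
        window = [1.0] * T_s
    n_frames = len(y_b) // T_s
    return [
        sum(abs(y_b[m * T_s + n] * window[n]) ** 2 for n in range(T_s))
        for m in range(n_frames)
    ]


# Hypothetical samples: two complete frames of length 2 plus one leftover sample.
energies = frame_energies([1, 1, 2, 2, 3], T_s=2)
```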

Combining this with equation 10, it follows that:

$$E[m] \approx \sum_{n=0}^{T_s - 1} \left| y_{b,m}[n] \right|^2 = \sum_{n=0}^{T_s - 1} \left| (1 + \alpha\, w_e[m])\, x_{b,m}[n] \right|^2 \tag{12}$$

where w_(e)[m] is the m-th extracted watermark symbol and contains N_(b) time-multiplexed estimates of the embedded watermark sequences. Solving for w_(e)[m] in equation 12 and ignoring higher order terms of α gives the following approximation:

$$w_e[m] \approx \frac{1}{2\alpha} \left( \frac{\sum_{n=0}^{T_s - 1} \left| y_{b,m}[n] \right|^2}{\sum_{n=0}^{T_s - 1} \left| x_{b,m}[n] \right|^2} - 1 \right) \tag{13}$$
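The intermediate step between equations 12 and 13 can be written out as follows. The symbol E_x[m] is shorthand introduced here for the energy of the host frame, and the factor (1 + α w_(e)[m]) is assumed to be real and constant over the frame.

$$E[m] \approx (1 + \alpha\, w_e[m])^2 \sum_{n=0}^{T_s - 1} \left| x_{b,m}[n] \right|^2 = \left(1 + 2\alpha\, w_e[m] + \alpha^2 w_e[m]^2\right) E_x[m]$$

Dropping the term in α² and dividing by E_x[m] gives

$$\frac{E[m]}{E_x[m]} \approx 1 + 2\alpha\, w_e[m] \quad\Longrightarrow\quad w_e[m] \approx \frac{1}{2\alpha}\left(\frac{E[m]}{E_x[m]} - 1\right)$$

which is the form of equation 13.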

In the watermark extraction stage 200 shown in FIG. 8, the output y′_(b)[n] of the filter H_(b) is provided as an input to a frame divider 220, which divides the audio signal into frames of length T_(s), i.e. into y′_(b,m)[n], with the energy calculating unit 230 then being used to calculate the energy corresponding to each of the framed signals as per equation (11). The output of this energy calculation unit 230 is then provided as an input to the whitening stage H_(w) (240), which performs the function shown in equation 13 so as to provide an output w_(e)[m]. Alternative implementations (240A, 240B) of this whitening stage are illustrated in FIGS. 9 and 10.

It will be realized that the denominator of equation 13 contains a term that requires knowledge of the host signal x. As the signal x is not available to the detector, the denominator of equation 13 must be estimated in order to calculate w_(e)[m].

Below is described how such an estimation can be achieved for the two described window shaping functions (the raised cosine window shaping function and the bi-phase window shaping function), but it will equally be appreciated that the teaching could be extended to other window shaping functions.

In relation to the raised cosine window shaping function shown in FIG. 6a, it has been realized that the audio envelope induced by the watermark contributes predominantly to the noisy part of the energy function E[m]. The slowly varying part (i.e. the low frequency components) is predominantly due to the contribution of the envelope of the original audio signal x. Thus, equation 13 may be approximated by:

$$w_e[m] \approx \frac{1}{2\alpha} \left( \frac{E[m]}{\mathrm{lowpass}(E[m])} - 1 \right) \tag{14}$$

where “lowpass(.)” is a low pass filter function. Thus, it will be appreciated that the whitening filter H_(w) for the raised cosine window shaping function can be realized as shown in FIG. 9.

As can be seen, such a whitening filter H_(w) (240A) comprises an input 242A for receiving the signal E[m]. A portion of this signal is then passed through the low pass filter 247A to produce a low pass filtered energy signal E_(LP)[m], which in turn is provided as an input to the calculation stage 248A along with the function E[m]. The calculation stage 248A then divides E[m] by E_(LP)[m] to calculate the extracted watermark symbol w_(e)[m].
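Equation 14 can be sketched as below, by way of example only. The text does not specify the low pass filter, so a centred moving average is assumed here as a stand-in; the span, the value of α, the function names and the sample energies are all hypothetical, and the low-passed energy is assumed to be non-zero.

```python
def moving_average(values, span):
    """Centred moving average, used here as an assumed low pass filter."""
    half = span // 2
    out = []
    for m in range(len(values)):
        lo = max(0, m - half)
        hi = min(len(values), m + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def whiten_raised_cosine(E, alpha, span=5):
    """Equation (14): w_e[m] ~ (E[m] / lowpass(E[m]) - 1) / (2 * alpha)."""
    E_lp = moving_average(E, span)
    return [(e / lp - 1.0) / (2.0 * alpha) for e, lp in zip(E, E_lp)]


# Hypothetical frame energies fluctuating around a slowly varying level.
w_e = whiten_raised_cosine([4.0, 4.4, 3.6, 4.0, 4.4, 3.6], alpha=0.1)
```

When the energy sequence is constant, the ratio is one in every frame and the extracted symbols are all zero, which matches the intuition that a flat envelope carries no watermark.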

When a bi-phase window function is employed in the watermarkconditioning stage of the embedder, a different approach should beutilized to estimate the envelope of the original audio, and hence tocalculate w_(e)[m].

It will be seen by examination of the bi-phase window function shown inFIG. 6 b, that when the envelope of an audio frame is modulated withsuch a window function, the first and the second halves of the frame arescaled in opposite directions. In the detector, this property isutilized to estimate the envelope energy of the host signal x.

Consequently, within the detector, the audio frame is first sub-divided into two halves. The energy functions corresponding to the first and second half frames are given by

$\begin{matrix}{E_{1}[m] = \sum\limits_{n = 0}^{T_{s}/2 - 1}{y_{b,m}^{\prime}[n]}^{2}} & (15) \\{E_{2}[m] = \sum\limits_{n = T_{s}/2}^{T_{s} - 1}{y_{b,m}^{\prime}[n]}^{2}} & (16)\end{matrix}$

respectively. As the envelope of the original audio is modulated in opposite directions within the two sub-frames, the original audio envelope can be approximated as the mean of E₁[m] and E₂[m].

Further, the instantaneous modulation value can be taken as the difference between these two functions. Thus, for the bi-phase window function, the watermark w_(e)[m] can be approximated by:

$\begin{matrix}{{w_{e}\lbrack m\rbrack} \approx {\frac{1}{2\alpha}\left( {\frac{{E_{1}\lbrack m\rbrack} - {E_{2}\lbrack m\rbrack}}{{E_{1}\lbrack m\rbrack} + {E_{2}\lbrack m\rbrack}} - 1} \right)}} & (17)\end{matrix}$

Consequently, the whitening filter H_(w) 240B for a bi-phase window shaping function can be realized as shown in FIG. 10. Inputs 242B and 243B respectively receive the energy functions of the first and second half frames, E₁[m] and E₂[m]. Each energy function is then split into two, and provided to adders 245B and 246B, which respectively calculate E₁[m]−E₂[m] and E₁[m]+E₂[m]. Both of these calculated functions are then passed to the calculating unit 248B, which divides the value from adder 245B by the value from adder 246B so as to calculate the watermark w_(e)[m], in accordance with equation 17.
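
A minimal sketch of the half frame energies of equations 15 and 16, followed by equation 17 applied exactly as printed above, might look as follows in Python. The function names are hypothetical. The frame is assumed to hold T_(s) samples with T_(s) even, and the sum E₁[m]+E₂[m] is assumed to be non-zero.

```python
def half_frame_energies(frame):
    """Return (E1, E2), the energies of the two half frames (eqs. 15 and 16)."""
    half = len(frame) // 2
    e1 = sum(s * s for s in frame[:half])
    e2 = sum(s * s for s in frame[half:2 * half])
    return e1, e2


def whiten_biphase(e1, e2, alpha):
    """Apply equation 17, as printed, to one pair of half frame energies."""
    ratio = (e1 - e2) / (e1 + e2)  # adder 245B output divided by adder 246B output
    return (ratio - 1.0) / (2.0 * alpha)
```

For example, the frame [1.0, 1.0, 2.0, 2.0] gives E₁=2 and E₂=8, so that the ratio formed by the calculating unit is −0.6.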

This output w_(e)[m] is then passed to the buffering and interpolation stage 300, where the signal is de-multiplexed by a de-multiplexer 310, buffered in buffers 320 of length L_(b) so as to resolve any lack of synchronism between the embedder and the detector, and interpolated within the interpolation unit 330 so as to compensate for any time scale modification between the embedder and the detector. Such compensation can utilize known techniques, and hence is not described in any more detail within this specification.

As shown in FIG. 8, outputs (w_(D1), w_(D2), . . . w_(DNb)) from the buffering stage are passed to the interpolation stage and, after interpolation, the outputs (w_(I1), w_(I2), . . . w_(INb)) of this stage, which correspond to the different estimates of the correctly re-scaled signal, are passed to the correlation and decision stage. If it is believed that no time scaling compensation is required, the values (w_(D1), w_(D2), . . . w_(DNb)) can be passed directly to the correlation and decision stage 400, i.e. the interpolation stage 330 can be omitted from the apparatus.

The correlator 410 calculates the correlation of each estimate w_(Ij), j=1, . . . ,N_(b), with respect to the reference watermark sequence w_(s)[k]. Each respective correlation output corresponding to each estimate is then applied to the maximum detection unit 420, which determines which two estimates provided the best fits for the circularly shifted versions w_(d1) and w_(d2) of the reference watermark. The correlation values (the peak amplitudes and positions) for these estimate sequences are passed to the threshold detector and payload extractor unit 430.

If the interpolation stage is omitted, the correlator 410 instead calculates the correlation of each estimate w_(Dj), j=1, . . . ,N_(b), with the reference watermark sequence w_(s)[k], and the results are passed on for subsequent processing to the units 420 and 430 as outlined in the above paragraph.
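
As a hedged illustration of the correlation step for a single estimate, the following Python sketch correlates an estimate with every circular shift of the reference sequence and reports the two bins of largest magnitude. The direct summation, the function names and the treatment of each bin as a candidate peak are assumptions of the example; the specification does not prescribe how the correlator 410 or the maximum detection unit 420 are implemented.

```python
def circular_correlation(estimate, reference):
    """Correlate the estimate with every circular shift of the reference."""
    n = len(reference)
    return [
        sum(estimate[i] * reference[(i - shift) % n] for i in range(n))
        for shift in range(n)
    ]


def two_strongest_peaks(correlation):
    """Return (position, value) for the two bins of largest magnitude."""
    order = sorted(range(len(correlation)),
                   key=lambda k: abs(correlation[k]), reverse=True)
    return [(k, correlation[k]) for k in sorted(order[:2])]
```

When the estimate is the sum of two circularly shifted copies of the reference, the two reported positions are the two shifts, and their separation gives the distance between the peaks.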

The threshold detector and payload extractor unit 430 may be utilized to extract the payload (e.g. information content) from the detected watermark signal. Once the unit has estimated the two correlation peaks cL₁ and cL₂ that exceed the detection threshold, the distance pL between the peaks (as defined by equation (6)) is measured. Next, the signs ρ₁ and ρ₂ of the correlation peaks are determined, and hence r_(sign) is calculated from equation (7). The overall watermark payload may then be calculated using equation (8).

For instance, it can be seen in FIG. 11 that pL is the relative distance between the two peaks. Both peaks are positive, i.e. ρ₁=+1 and ρ₂=+1. From equation (7), r_(sign)=3. Consequently, the payload pL_(w)=<3, pL>.

The reference watermark sequence w_(s) used within the detector corresponds to (a possibly circularly shifted version of) the original watermark sequence applied to the host signal. For instance, if the watermark signal was calculated using a random number generator with seed S within the embedder, then equally the detector can calculate the same random number sequence using the same random number generation algorithm and the same initial seed so as to determine the watermark signal. Alternatively, the watermark signal originally applied in the embedder and utilized by the detector as a reference could simply be any predetermined sequence.
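
The shared seed arrangement can be sketched as follows. This is an illustration under stated assumptions: Python's standard generator and a uniform distribution on [−1, 1] are chosen only for the example, and the function names are hypothetical. The essential point is that the embedder and the detector run the same algorithm from the same seed.

```python
import random


def reference_sequence(seed, length):
    """Regenerate the watermark sequence from a seed shared with the embedder."""
    rng = random.Random(seed)  # private generator; global state is untouched
    return [rng.uniform(-1.0, 1.0) for _ in range(length)]


def circular_shift(sequence, shift):
    """Return the sequence circularly shifted to the right by `shift` positions."""
    n = len(sequence)
    return [sequence[(i - shift) % n] for i in range(n)]
```

Calling `reference_sequence` twice with the same seed and length yields identical sequences, which is the property the detector relies upon.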

FIG. 11 shows a typical shape of a correlation function as output from the correlator 410. The horizontal scale shows the correlation delay (in terms of the sequence bins). The vertical scale on the left hand side (referred to as the confidence level cL) represents the value of the correlation peak normalized with respect to the standard deviation of the (typically normally distributed) correlation function.
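
One possible reading of this normalization is sketched below: every correlation bin is divided by the standard deviation of the whole correlation function. Estimating the standard deviation over all bins (peaks included) and the function name are assumptions of the example, not details taken from the specification.

```python
import statistics


def confidence_levels(correlation):
    """Express each correlation bin in units of the correlation's standard deviation."""
    sigma = statistics.pstdev(correlation)
    if sigma == 0.0:
        raise ValueError("correlation is constant; confidence level is undefined")
    return [c / sigma for c in correlation]
```

The resulting values can be compared directly against a detection threshold expressed as a confidence level.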

As can be seen, the typical correlation is relatively flat with respect to cL, and centered about cL=0. However, the function contains two peaks, which are separated by pL (see equation 6) and extend upwards to cL values that are above the detection threshold when a watermark is present. When the correlation peaks are negative, the above statement applies to their absolute values.

A horizontal line (shown in FIG. 11 as being set at cL=8.7) represents the detection threshold. The detection threshold value controls the false alarm rate.

Two kinds of false alarms exist: the false positive rate, defined as the probability of detecting a watermark in non-watermarked items, and the false negative rate, defined as the probability of not detecting a watermark in watermarked items. Generally, the requirement on the false positive rate is more stringent than that on the false negative rate. The right hand side scale on FIG. 11 illustrates the probability of a false positive alarm p. As can be seen, in the example shown, the probability of a false positive p=10⁻¹² is equivalent to the threshold cL=8.7, whilst p=10⁻⁸³ is equivalent to cL=20.
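
The way in which the threshold controls the false positive probability can be illustrated with a simple model. The sketch below assumes that, in the absence of a watermark, each normalized correlation bin is a zero mean, unit variance normal variable, that only positive excursions are counted, and that a union bound over a hypothetical number of bins `num_bins` is adequate. These are assumptions of the example, and the figures it produces are not claimed to reproduce the right hand scale of FIG. 11.

```python
import math


def gaussian_tail(threshold):
    """P(X > threshold) for a zero mean, unit variance normal variable."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


def false_positive_bound(threshold, num_bins):
    """Union bound on any of `num_bins` noise bins exceeding the threshold."""
    return min(1.0, num_bins * gaussian_tail(threshold))
```

Under this model, raising the threshold reduces the bound very rapidly, which is why a modest increase in cL corresponds to a change of many orders of magnitude in p.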

After each detection interval, the detector determines whether the original watermark is present or whether it is not present, and on this basis outputs a “yes” or a “no” decision. If desired, to improve this decision making process, a number of detection windows may be considered. In such an instance, the false positive probability is a combination of the individual probabilities for each detection window considered, dependent upon the desired criteria. For instance, it could be determined that if the correlation function has two peaks above a threshold of cL=7 on any two out of three detection intervals, then the watermark is deemed to be present. Obviously, such detection criteria can be altered depending upon the desired use of the watermark signal and to take into account factors such as the original quality of the host signal and how badly the signal is likely to be corrupted during normal transmission.
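
The two out of three criterion mentioned above might be sketched as follows. Counting every bin whose magnitude reaches the threshold as a peak, and the function and parameter names, are simplifying assumptions made for the example.

```python
def window_has_watermark(confidence, threshold=7.0):
    """True when at least two bins reach the threshold in magnitude."""
    return sum(1 for c in confidence if abs(c) >= threshold) >= 2


def watermark_present(windows, threshold=7.0, required=2):
    """Declare the watermark present if enough detection windows pass."""
    passed = sum(1 for w in windows if window_has_watermark(w, threshold))
    return passed >= required
```

Here `windows` holds one list of confidence levels per detection interval, and `required` and `threshold` correspond to the criteria that may be altered to suit the intended use.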

It will be appreciated by the skilled person that various implementations not specifically described would be understood as falling within the scope of the present invention. For instance, whilst only the functionality of the embedding and detecting apparatus has been described, it will be appreciated that the apparatus could be realized as a digital circuit, an analog circuit, a computer program, or a combination thereof.

Equally, whilst the above embodiment has been described with reference to an audio signal, it will be appreciated that the present invention can be applied to other types of signal, for instance video and data signals.

Within the specification it will be appreciated that the word “comprising” does not exclude other elements or steps, that “a” or “an” does not exclude a plurality, and that a single processor or other unit may fulfil the functions of several means recited in the claims.

1. A method of generating a watermark signal for embedding in a multimedia signal, the method comprising: generating two sequences of values, the second sequence being a circularly shifted version of the first sequence; and generating a watermark signal by adding the values of the first sequence to the respective values in the corresponding positions of the second sequence, wherein each value of the first and second sequences is represented by a pulse of width T_(s) so as to form rectangular wave signals.
2. The method as claimed in claim 1, wherein a window shaping function is applied to convert each of the rectangular pulse train signals into respective smoothly varying signals, with the resulting smoothly varying signals being added to form the watermark signal.
3. A method of generating a watermark signal for embedding in a multimedia signal, the method comprising: generating two sequences of values, the second sequence being a circularly shifted version of the first sequence; and generating a watermark signal by adding the values of the first sequence to the respective values in the corresponding positions of the second sequence, wherein each one of said sequences of values is convolved with a window shaping function which has a width of at least T_(s), so as to generate two smoothly varying signals, these smoothly varying signals being added together so as to form the watermark signal.
4. The method as claimed in claim 3, wherein said window shaping function has a band limited frequency behavior and a smooth temporal behavior.
5. The method as claimed in claim 4, where the window shaping function has a symmetric or anti-symmetric temporal behavior.
6. The method as claimed in claim 3, wherein said window shaping function comprises at least one of a raised cosine function and a bi-phase function.
7. The method as claimed in claim 3, wherein the watermark signal is generated by the addition of the two smoothly varying signals with a relative delay of T_(r), where T_(r)<T_(s).
8. The method as claimed in claim 7, wherein T_(r) is chosen such that maximum amplitude points of the first smoothly varying signal coincide with zero-crossings of the second smoothly varying signal, and vice-versa.
9. The method as claimed in claim 1, wherein said watermark signal has a payload that is encoded in the combination of said two sequences of values.
10. An apparatus arranged to generate a watermark signal for embedding in a multimedia signal, the apparatus comprising: a sequence generator arranged to use a first sequence of values to generate a second sequence of values, the second sequence being a circularly shifted version of the first sequence; a signal generator arranged to generate a watermark signal by adding the values of the first sequence to the respective values in the corresponding positions of the second sequence; and a signal conditioner arranged to convert each sequence of values into a smoothly varying signal.
11. The apparatus as claimed in claim 10, wherein the apparatus is arranged to generate said first sequence of values by circularly shifting a primary sequence of values.
12. A method of embedding a watermark in a multimedia signal, the method comprising: generating a watermark signal equal to the sum of two sequences of values, the second sequence being a circularly shifted version of the first sequence of values; generating a host modifying multimedia signal as a product of the watermark signal and the multimedia signal; and generating a watermarked multimedia signal by adding a scaled version of said host modifying multimedia signal to the multimedia signal, wherein said scaled version of the host modifying signal is generated by controlling the scaling factor by a predetermined cost-function.
13. The method as claimed in claim 12, wherein said cost function comprises multiple scaling factors, each scaling factor being defined separately for one or more of the plurality of frequency bands in the multimedia signal.
14. The method as claimed in claim 13, wherein said frequency bands are determined according to a model of the human auditory and/or visual system.
15. The method as claimed in claim 12, wherein said host modifying multimedia signal is generated by multiplying said watermark signal with an extracted portion of the multimedia signal.
16. The method as claimed in claim 15, wherein said extracted portion of the multimedia signal is obtained by filtering at least a portion of the multimedia signal with respect to at least one of frequency, space and time.
17. The method as claimed in claim 12, wherein the method further comprises: generating a second watermark signal equal to the sum of a third and a fourth sequences of values, the fourth sequence being a circularly shifted version of the third sequence of values; extracting a second portion of the multimedia signal, the second portion being filtered such that it does not overlap with said first portion; generating a watermarked multimedia signal by adding the product of the second watermark signal and the second extracted portion of the multimedia signal to the watermarked multimedia signal.
18. A method of detecting a watermark signal embedded in a multimedia signal, the method comprising: receiving a multimedia signal that may potentially be watermarked by a watermark signal modifying the temporal envelope of the host multimedia signal; applying a window shaping function to said received signal; extracting an estimate of the watermark from said received signal; and correlating the estimate of the watermark with a reference version of the watermark so as to determine whether the received signal is watermarked.
19. The method as claimed in claim 18, wherein the watermark signal has a payload, and the method further comprises determining the payload of the watermark.